Negation Naive Bayes for Categorization of Product Pages on the Web
Abstract
We propose negation naive Bayes (NNB), a new method for categorizing product pages on the Web based on the information they contain. NNB is a modified version of naive Bayes (NB), inspired by complement naive Bayes (CNB). We compared NNB with NB and CNB. Our experiments show that NNB significantly outperformed the other methods when the product pages were distributed non-uniformly across categories.
Similar resources
Cleaning Web Pages for Effective Web Content Mining
Classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-based search engines and taxonomic web page categorization applications). Noise on web pages is irrelevant to the main content of the pages being mined, and includes advertisements, navigation bars, and copyright noti...
Feature Selection for Web Page Classification
Web page classification is significantly different from traditional text classification because of the presence of some additional information, provided by the HTML structure and by the presence of hyperlinks. In this paper we analyze these peculiarities and try to exploit them for representing web pages in order to improve categorization accuracy. We conduct various experiments on a corpus of ...
A Generic Approach for Web Page Classification Using URL’s Features Along With the Textual Content
Classification of web pages greatly helps make search engines more efficient by returning results relevant to the user’s queries. In most of the prevailing algorithms in the literature, the classification/categorization depends solely on features extracted from the text content of the web pages. But as most web pages nowadays are predominantly filled with imag...
Integrating Multiple Internet Directories by Instance-based Learning
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directo...
Catégorisation automatique de pages web chinoises - documents spécialisés vs grand public sur le tabagisme
Text categorization (or supervised classification) generally addresses the topic or the type of a text. We tackle here a different dimension, the intended audience, contrasting two broad categories: texts intended for the general public, or texts intended for specialists. We test the categorization, according to this contrast, of Chinese Web pages about smoking. In this context, we obtain the f...
Publication date: 2011